ABSTRACT

It  is  shown  that  deciding  whether  a  set  of  functional  dependencies 
and  one  join  dependency  implies  another  join  dependency  is  NP-complete. 
It  is  also  shown  that  deciding  whether  a  JD-rule  can  be  applied  to  a 
tableau  T  is  NP-complete.  This  problem  is  NP-complete  even  if  T  can  be 
obtained  from  a  tableau  corresponding  to  a  join  dependency  by  applying 
some  FD-rules.  As  a  result,  it  follows  that  computing  the  join  of 
several  relations  is  NP-hard. 


CR  categories:  4.33,  5.25 

Key  words  and  phrases:  functional  dependency,  multivalued  dependency, 
join  dependency,  join,  membership  algorithm,  NP-complete,  relational 
database. 
1. Introduction

The  relational  model  for  databases  [Cod]  uses  dependencies  as  a 
semantic  tool  for  expressing  constraints  that  the  data  must  satisfy. 
Functional  dependencies  and  join  dependencies  (that  include  multivalued 
dependencies  as  a  special  case)  are  examples  of  such  dependencies.  A 
utilization  of  these  dependencies  in  the  design  of  relational  databases 
depends  upon  the  ability  to  develop  membership  algorithms,  that  is, 
algorithms for deciding whether a set of dependencies Σ implies another
dependency σ. Several efficient membership algorithms are known if all
the dependencies are functional or multivalued [Bel,BeB,Gal,HIT,Sag],
and  an  exponential  time  and  space  algorithm  exists  for  functional  and 
join  dependencies  [MMS]. 

In this paper we show that if σ is a join dependency, and Σ is a
set of functional dependencies and one join dependency, then deciding
whether Σ implies σ is NP-complete. As a by-product of this result, we
show that the problem of deciding whether a JD-rule can be applied to a
tableau T, and the problem of deciding whether a relation r does not
obey a join dependency are NP-complete. The first problem is
NP-complete even if T can be obtained from a tableau corresponding to a
join dependency by applying some FD-rules. Another by-product is a
proof that deciding whether a relation r is not the join of relations
r_1,...,r_n is NP-hard. It is easy to give examples in which the join of
r_1,...,r_n has an exponential size (measured as a function of the space
needed to write down r_1,...,r_n). Therefore, this result indicates that
an algorithm for computing the join of r_1,...,r_n whose running time is
polynomial in the size of the output (i.e., the space needed to write
down the join of r_1,...,r_n) is unlikely to exist. A similar result is
given in [HLY]. However, our result is stronger, since we assume that
r_1,...,r_n are projections of some universal instance I.

A recent result [Yan] shows that if σ is a functional or a
multivalued dependency, then deciding whether a set Σ of functional and
join dependencies implies σ can be done in polynomial time. Thus, the
only remaining open problem is to find a lower bound on the complexity
of deciding whether a set of join dependencies implies another join
dependency. It is interesting to note that the only known algorithm for
the more restricted problem of deciding whether a set of multivalued
dependencies implies a join dependency is exponential in time and space,
and there is no known lower bound [ABU].


2.    Basic  Definitions 

A  relation  is  a  two-dimensional  table  in  which  columns  correspond 
to  attributes,  and  rows  correspond  to  records  or  tuples.  Each  attribute 
has  an  associated  domain  of  values,  and  a  tuple  is  viewed  as  a  mapping 
from the attributes to their domains. If r is a relation, μ is a tuple
of r, and X is a set of attributes, then μ[X] denotes the values of μ in
the X-columns. A set of attributes labeling the columns of a relation
is called a relation scheme. If R is a set of attributes labeling the
columns of a relation r, then r is said to be defined on R. We use the
letters A,B,C,... to denote attributes, and the letters
...,R,S,...,X,Y,Z to denote sets of attributes (i.e., relation schemes).
A set of attributes is written as a string of attributes (e.g., ABCD is
the set {A,B,C,D}), and the union of sets of attributes X and Y is
written XY. In this paper we assume that all the attributes are drawn
from a universal set of attributes U.

A functional dependency (abbr. FD) [Arm,Cod] is a statement of the
form X → Y, where both X and Y are sets of attributes. The FD X → Y
holds in a relation r, if for all tuples μ and ν of r, if μ[X] = ν[X],
then μ[Y] = ν[Y].
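
The definition of an FD can be checked directly on a small relation.
The following Python fragment is only a sketch: the representation of a
relation as a list of dictionaries (attribute name to value) and the
name fd_holds are assumptions made for illustration, not notation used
elsewhere in this paper.

```python
def fd_holds(relation, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in `relation`.

    `relation` is a list of tuples, each a dict mapping attribute -> value.
    `lhs` and `rhs` are iterables of attribute names; a string such as
    "AB" is read as the set {A, B}, following the paper's convention.
    """
    seen = {}  # maps each X-value to the Y-value it was first seen with
    for t in relation:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        # setdefault stores val on first sight and returns the stored value
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on X but disagree on Y
    return True
```

For instance, in the two-tuple relation {(A=1,B=2,C=3), (A=1,B=2,C=4)}
the FD A → B holds and the FD A → C does not.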


Let R_1,...,R_q be relation schemes, and let r be a relation on
R_1 ∪ ... ∪ R_q. Suppose that μ_1,...,μ_q are q tuples of r (not
necessarily distinct). The tuples μ_1,...,μ_q are joinable on
R_1,...,R_q with a result ν, if there exists a mapping ν defined on
R_1 ∪ ... ∪ R_q such that for all 1≤i≤q, μ_i[R_i] = ν[R_i]. A join
dependency (abbr. JD) [Ris] is a statement of the form *[R_1,...,R_q],
where each R_i is a relation scheme. The JD *[R_1,...,R_q] holds in a
relation r defined on R_1 ∪ ... ∪ R_q if whenever tuples μ_1,...,μ_q
of r are joinable on R_1,...,R_q with a result ν, then ν is also a
tuple of r. The JD *[R_1,...,R_q] is defined on the relation scheme
R_1 ∪ ... ∪ R_q.
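
The notions of joinable tuples and of a JD holding in a relation can be
made concrete with a brute-force test. The sketch below is an
illustration only; the dictionary representation of tuples and the
names join_result and jd_holds are assumptions, and the test is
exponential in the number q of relation schemes.

```python
from itertools import product

def join_result(tuples, schemes):
    """Return the result of joining `tuples` on `schemes`, or None.

    tuples[i] is a dict (attribute -> value) and schemes[i] is a string of
    single-letter attribute names. The tuples are joinable if one mapping v
    agrees with tuples[i] on schemes[i] for every i.
    """
    v = {}
    for t, scheme in zip(tuples, schemes):
        for a in scheme:
            if v.setdefault(a, t[a]) != t[a]:
                return None  # two tuples disagree on a shared attribute
    return v

def jd_holds(relation, schemes):
    """Brute-force test of the JD *[schemes] in `relation`."""
    for choice in product(relation, repeat=len(schemes)):
        v = join_result(choice, schemes)
        if v is not None and v not in relation:
            return False  # a joinable choice whose result is missing
    return True
```

A choice of q tuples whose result is missing from the relation is a
witness that the JD does not hold.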

A multivalued dependency (abbr. MVD) [BFH,Fag,Zan] is a JD with at
most two relation schemes. An MVD *[R_1,R_2] is also written
R_1 ∩ R_2 →→ R_1 (or equivalently R_1 ∩ R_2 →→ R_2). Conversely, the
MVD X →→ Y defined on U can be written as the JD *[XY,XZ], where
Z = U - X - Y. Both FD's and MVD's have a complete set of inference
rules [Arm,BFH], and polynomial time membership algorithms
[Bel,BeB,Gal,HIT,Sag]. An MVD X →→ Y holds in a relation r if and only
if X →→ Y - X holds in r [Fag]. Therefore, we can assume that in an MVD
X →→ Y the left and right sides (i.e., X and Y) are disjoint.

Let r_1,...,r_n be relations defined on R_1,...,R_n, respectively. The
join of r_1,...,r_n, written *_{i=1}^{n} r_i, is

    {μ | there are tuples μ_i ∈ r_i (1≤i≤n) such that μ_1,...,μ_n
         are joinable on R_1,...,R_n with a result μ}
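
The join of several relations can be computed by folding a pairwise
join over the list of relations, which yields the same set of tuples as
the definition above. The following Python sketch assumes that a
relation is a list of dictionaries; the names natural_join and join_all
are hypothetical.

```python
def natural_join(r, s):
    """Join two relations (lists of dicts) on their shared attributes."""
    out = []
    for t in r:
        for u in s:
            # t and u are joinable if they agree on every common attribute
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                merged = {**t, **u}
                if merged not in out:
                    out.append(merged)
    return out

def join_all(relations):
    """Fold natural_join over a nonempty list of relations."""
    result = relations[0]
    for rel in relations[1:]:
        result = natural_join(result, rel)
    return result
```

Already for two relations the output can be larger than the input: two
relations of two tuples each that agree on the common attribute produce
four tuples.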

Let Σ be a set of JD's, and let σ be a JD or an FD. We assume that
all the JD's are defined on U. The dependency σ is a consequence of Σ
(or σ is implied by Σ) if and only if for all relations r on U, the
dependency σ holds in r if all the dependencies of Σ hold in r.

Let Σ be a set of dependencies. A convenient way of representing
all the MVD's with a fixed left side that are implied by Σ is by
constructing the dependency basis. The dependency basis of a set of
attributes X is a partition of U into pairwise disjoint subsets of
attributes X, Y_1,...,Y_n such that

(1) X →→ Y_i is implied by Σ (1≤i≤n), and

(2) if X →→ Y is implied by Σ, then Y is a union of some of the Y_i's.

The existence of the dependency basis follows from the inference rules
for MVD's [Fag]. If Σ contains only FD's and MVD's, then the dependency
basis can be constructed in polynomial time [Bel,Gal,HIT,Sag].

A tableau [ABU,ASU] is a two-dimensional matrix in which columns
correspond to attributes. The rows of a tableau consist of variables of
the following types

(1) distinguished variables, usually denoted by subscripted a's, and

(2) nondistinguished variables, usually denoted by subscripted b's.

A variable cannot appear in more than one column, and in each column
there is exactly one distinguished variable.

A JD *[R_1,...,R_q] has a corresponding tableau T as follows. For
each R_i, tableau T has a row w_i with distinguished variables exactly in
the R_i-columns, and distinct nondistinguished variables in the rest of
the columns. We can also view a tableau as a relation over the domain
of distinguished and nondistinguished variables. Note that rows
w_1,...,w_q of T are joinable on R_1,...,R_q, and the resulting row
consists only of distinguished variables.

Example 1: Consider the JD *[AB,BCD,AD]. The tableau corresponding
to this JD is

A   B   C   D
a1  a2  b1  b2
b3  a2  a3  a4
a1  b4  b5  a4


[] 
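
The construction of the tableau corresponding to a JD is mechanical. As
a sketch (the function name tableau_for_jd and the convention of naming
variables a1, a2, ... and b1, b2, ... are assumptions chosen to match
the tableau of Example 1):

```python
def tableau_for_jd(attributes, schemes):
    """Build the tableau of the JD *[schemes] over `attributes`.

    Column k (1-based) uses the distinguished variable 'a<k>'; every other
    cell gets a fresh nondistinguished variable 'b1', 'b2', ...
    Each row is a dict mapping attribute -> variable name.
    """
    rows, fresh = [], 0
    for scheme in schemes:
        row = {}
        for k, attr in enumerate(attributes, start=1):
            if attr in scheme:
                row[attr] = f"a{k}"      # distinguished in the R_i-columns
            else:
                fresh += 1
                row[attr] = f"b{fresh}"  # distinct nondistinguished variable
        rows.append(row)
    return rows
```

Calling tableau_for_jd("ABCD", ["AB", "BCD", "AD"]) reproduces the three
rows of Example 1.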


Let Σ be a set of FD's and JD's. Each dependency in Σ has an
associated rule that can be applied to any tableau T as follows.

(1) FD-Rules. An FD X → Y in Σ has an associated rule for equating
variables of T as follows. Suppose that rows w_1 and w_2 of T agree in
all the X-columns, but disagree in an A-column, where A is an attribute
of Y. If one of w_1 and w_2 has a distinguished variable in its A-column,
then rename the two rows so that w_1 is that row. The FD-rule for X → Y
replaces all occurrences of the variable appearing in the A-column of w_2
with the variable appearing in the A-column of w_1.

(2) JD-Rules. A JD *[S_1,...,S_p] in Σ has an associated rule for
adding rows to T as follows. If rows w_1,...,w_p of T are joinable on
S_1,...,S_p with a result w, and w is not already in T, then w is added
to T.


Each one of the above rules transforms a tableau T to another
tableau T'. The rules can be applied repeatedly to a tableau T only a
finite number of times, and the result is unique (up to renaming of
nondistinguished variables) [MMS]. The chase of T under Σ, denoted
chase_Σ(T), is the tableau obtained by applying the rules associated with
Σ to T until no rule can be applied anymore. Let σ be a JD with a
corresponding tableau T_σ. The JD σ is a consequence of Σ if and only if
chase_Σ(T_σ) contains a row consisting only of distinguished variables
[MMS].

Example 2: Let Σ = {*[AB,BCD,ABD], A → B, C → A}, and let σ be the
JD *[AB,BCD,AD] whose corresponding tableau is given in Example 1. The
FD-rule for A → B can be applied to the first and third rows of the
tableau in Example 1, and hence b4 is identified with a2. The resulting
tableau is

A   B   C   D
a1  a2  b1  b2
b3  a2  a3  a4
a1  a2  b5  a4

The first, second, and third rows of the above tableau are joinable on
AB,BCD,ABD with a result (a1,a2,a3,a4). Thus, applying the JD-rule for
*[AB,BCD,ABD] produces the tableau

A   B   C   D
a1  a2  b1  b2
b3  a2  a3  a4
a1  a2  b5  a4
a1  a2  a3  a4

Applying the FD-rule for C → A to the second and fourth rows of the
above tableau identifies b3 with a1. As a result the second row becomes
identical to the fourth row, and hence it can be omitted. The resulting
tableau is

A   B   C   D
a1  a2  b1  b2
a1  a2  b5  a4
a1  a2  a3  a4

No rule for Σ can be applied to the above tableau. []
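
The computation of Example 2 can be replayed with a small brute-force
chase. The Python sketch below is an illustration under stated
assumptions: rows are dictionaries from attribute names to variable
names, distinguished variables are recognized by the prefix 'a', and the
function names are hypothetical. It applies the JD-rule exhaustively, so
it also adds the intermediate row (b3,a2,b5,a4), which the FD-rule for
C → A then merges into an existing row; the final tableau is the one
obtained in Example 2.

```python
from itertools import product

def apply_fd_rules(rows, fds):
    """Apply FD-rules (pairs (lhs, rhs) of attribute strings) to a fixpoint."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1, r2 in product(rows, repeat=2):
                if any(r1[a] != r2[a] for a in lhs):
                    continue  # rows do not agree in the X-columns
                for a in rhs:
                    x, y = r1[a], r2[a]
                    if x == y:
                        continue
                    # Keep the distinguished variable if one of them is.
                    keep, drop = (x, y) if x.startswith("a") else (y, x)
                    for r in rows:
                        if r[a] == drop:
                            r[a] = keep
                    changed = True
    unique = []  # equating variables may make rows identical
    for r in rows:
        if r not in unique:
            unique.append(r)
    return unique

def apply_jd_rule(rows, schemes):
    """Add every row obtained by joining current rows on `schemes`."""
    added = False
    for choice in product(list(rows), repeat=len(schemes)):
        w, ok = {}, True
        for r, scheme in zip(choice, schemes):
            for a in scheme:
                if w.setdefault(a, r[a]) != r[a]:
                    ok = False
        if ok and w not in rows:
            rows.append(w)
            added = True
    return added

def chase(rows, fds, jds):
    """Alternate FD-rules and JD-rules until no rule can be applied."""
    rows = [dict(r) for r in rows]  # work on a copy
    while True:
        rows = apply_fd_rules(rows, fds)
        if not any([apply_jd_rule(rows, s) for s in jds]):
            return rows
```

The search over all choices of rows in apply_jd_rule is exponential in
the number of relation schemes of the JD.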
3. NP-Completeness Results Concerning Join Dependencies

3.1 Boolean Expressions and Tableaux

All the results use almost the same reduction from the
3-satisfiability (3-SAT) problem, shown NP-complete in [C]; see also
[K,GJ]. Let Q = F_1 ... F_m be a Boolean expression in conjunctive
normal form, where the F_j's are clauses of three literals each, and
x_1,x_2,...,x_n are all the variables appearing in this expression. We
denote the variables appearing in a clause F_j by x_{j1}, x_{j2}, and
x_{j3}. We assume that n≥4 (and hence m>1), and each variable appears in
at least two clauses. Note that if n≤3, then the satisfiability of Q can
be decided in linear time; and if a variable appears in only one clause,
then this clause is always satisfied and, hence, it can be omitted.
Thus, this variant of the 3-SAT problem is NP-complete.

We now show how Q is used to construct two tableaux that correspond
to join dependencies. These tableaux are similar to those used in the
NP-completeness proofs given in [ASU]. Each one of them has (m+3n+2)
columns. The first m columns correspond to the clauses F_1,...,F_m, and
they are labeled by the attributes E_1,...,E_m. The next 3n columns are
divided into three blocks of n columns each. The n columns in each
block correspond to the variables x_1,...,x_n. The columns of the three
blocks are labeled by A_i's, B_i's, and C_i's, respectively. The last two
columns are labeled by D_1 and D_2. The first tableau, denoted by S,
represents the m clauses. For each clause F_j containing the variables
x_{j1}, x_{j2}, and x_{j3}, tableau S has a row s_j as follows. Row s_j
has distinguished variables in the columns for E_j, A_{j1}, A_{j2},
A_{j3}, and D_1. All the other columns have distinct nondistinguished
variables. The tableau S has one more row, denoted by s_{m+1}, with
distinguished variables in all the E, B, C, and D_2 columns (the rest of
the columns have distinct nondistinguished variables). Let S_j be the
relation scheme corresponding to row s_j of S (1≤j≤m+1). That is, S_j
contains all the attributes labeling columns in which s_j has
distinguished variables. Thus, the tableau S corresponds to the JD
*[S_1,...,S_{m+1}].

The second tableau, denoted by T, represents truth assignments
under which clauses of Q are true. For every F_j (1≤j≤m), tableau T has
seven rows that represent all the truth assignments under which F_j is
true. If ψ is a truth assignment under which F_j is true, then T
contains a row w as follows. For 1≤i≤3, if x_{ji} is assigned 1 under ψ,
row w has a distinguished variable in the B_{ji}-column; otherwise, w has
a distinguished variable in the C_{ji}-column. Row w has distinguished
variables also in the E_j-column and in the D_1-column. The tableau T
has two additional rows, denoted by u and v. Row u has distinguished
variables in all the E, B, C, and D_2 columns (exactly as row s_{m+1} of
S). Row v has distinguished variables in all the A and D columns. All
the other columns of rows of T contain distinct nondistinguished
variables.


Example 3: Consider the Boolean expression

(x_1 + x_2 + x_3)(x_1 + x_3 + x_4)(x_1 + x_2 + x_4).

By a slight abuse of notation, we denote the distinguished variable in
each column by an a (without a subscript). The dots stand for distinct
nondistinguished variables. The tableau S is

     E1 E2 E3  A1 A2 A3 A4  B1 B2 B3 B4  C1 C2 C3 C4  D1 D2
s1   a  .  .   a  a  a  .   .  .  .  .   .  .  .  .   a  .
s2   .  a  .   a  .  a  a   .  .  .  .   .  .  .  .   a  .
s3   .  .  a   a  a  .  a   .  .  .  .   .  .  .  .   a  .
s4   a  a  a   .  .  .  .   a  a  a  a   a  a  a  a   .  a

The tableau T is given in Figure 1. []


Let Σ be a set of dependencies that consists of the JD
*[S_1,...,S_{m+1}], and the FD's B_iD_1 → A_i, C_iD_1 → A_i, and
D_1D_2 → A_i (for 1≤i≤n); and let σ be the JD corresponding to the
tableau T.
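
The construction of this section can be summarized as a short program.
The Python sketch below is illustrative only: the encoding of a literal
as a pair (i, positive), the attribute names 'E1', 'A2', 'D1', etc., and
the function name build_reduction are assumptions. It returns the
relation schemes of the rows of S, the relation schemes of the rows of T
(seven per clause, followed by u and v), and the FD's.

```python
from itertools import product

def build_reduction(clauses, n):
    """Return (S_schemes, T_schemes, fds) for a 3-CNF expression.

    `clauses` is a list of clauses; each clause is a list of three literals
    (i, positive) with 1 <= i <= n, where positive is False for a negated
    variable. Schemes and FD sides are sets of attribute names.
    """
    m = len(clauses)
    E = {f"E{j}" for j in range(1, m + 1)}
    A = {f"A{i}" for i in range(1, n + 1)}
    B = {f"B{i}" for i in range(1, n + 1)}
    C = {f"C{i}" for i in range(1, n + 1)}

    # Tableau S: one scheme S_j per clause, plus S_{m+1}.
    S = [{f"E{j}", "D1"} | {f"A{i}" for i, _ in cl}
         for j, cl in enumerate(clauses, start=1)]
    S.append(E | B | C | {"D2"})

    # Tableau T: one row per truth assignment that makes a clause true.
    T = []
    for j, cl in enumerate(clauses, start=1):
        for values in product([True, False], repeat=3):
            if any(val == pos for val, (_, pos) in zip(values, cl)):
                T.append({f"E{j}", "D1"} |
                         {f"{'B' if val else 'C'}{i}"
                          for val, (i, _) in zip(values, cl)})
    T.append(E | B | C | {"D2"})   # row u
    T.append(A | {"D1", "D2"})     # row v

    fds = []
    for i in range(1, n + 1):
        fds += [({f"B{i}", "D1"}, {f"A{i}"}),
                ({f"C{i}", "D1"}, {f"A{i}"}),
                ({"D1", "D2"}, {f"A{i}"})]
    return S, T, fds
```

For an expression with m clauses over n variables, the sketch produces
m+1 schemes for S, 7m+2 schemes for T, and 3n FD's, so the reduction is
polynomial in the size of Q.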


We will show that σ is a consequence of Σ if and only if Q is
satisfiable. The proof is an analysis of the computation of chase_Σ(T).
Since the rules associated with Σ can be applied to T in any order, we
start by applying the FD-rules. The FD-rules for D_1D_2 → A_i (1≤i≤n)
cannot be applied, since no two rows of T agree in the columns for D_1
and D_2. The application of the other FD-rules modifies only the
A-columns of T. Note that rows u and v of T are not affected by this
modification. After all possible applications of FD-rules to T, each
A_i-column is going to have exactly two repeated (1) nondistinguished
variables, say b_i^1 and b_i^0 (1≤i≤n). The variable b_i^1 results from
the application of the FD-rules for B_iD_1 → A_i, and can be viewed as
representing the truth value 1. The variable b_i^0 results from the
application of the FD-rules for C_iD_1 → A_i, and can be viewed as
representing the truth value 0. A row w of T representing a truth
assignment for a clause F_j (with variables x_{j1}, x_{j2}, and x_{j3})
is going to have b_{ji}^1 in the A_{ji}-column, if x_{ji} is true;
otherwise, it is going to have b_{ji}^0 in this column (1≤i≤3).


(1)  A  variable  is  repeated  if  it  appears  in  more  than  one  row. 

[Figure 1: the tableau T of Example 3; columns E1 E2 E3 A1 A2 A3 A4
B1 B2 B3 B4 C1 C2 C3 C4 D1 D2.]

Thus, the truth assignment represented by w is now given in the
A-columns of w. It is easy to show that no further applications of
FD-rules are possible. Let T' be the tableau obtained by applying the
FD-rules to T.


Lemma 1: Suppose that rows w_1,...,w_{m+1} of T' are joinable on
S_1,...,S_{m+1} with a result w, and w is not in T'. Then w_{m+1} is u,
and for all 1≤j≤m, row w_j is a row of T' representing a truth
assignment for F_j.

Proof: If all the w_i's are identical, then w is the same row as the
w_i's and, hence, it is in T'. Therefore, it suffices to show that if
either w_{m+1} is not u or some w_j (j≠m+1) is not a row representing a
truth assignment for F_j, then all the w_i's are identical.

Claim 1: If row w_{m+1} or row w_i (for some 1≤i≤m) has in the
E_i-column a nondistinguished variable that appears nowhere else in T',
then w_i and w_{m+1} are identical.

Claim 1 follows from the fact that for all 1≤i≤m, rows w_i and w_{m+1}
agree in the E_i-column, because both S_i and S_{m+1} contain E_i.

Claim 2: If some w_j (j≠m+1) is u, then for all 1≤i≤m, row w_i is u.

Claim 2 follows from the fact that for all 1≤i≤m, the relation
scheme S_i contains the attribute D_1, and u has in the D_1-column a
nondistinguished variable appearing nowhere else in T'.


Suppose that w_{m+1} is v. But v has in each E_i-column a distinct
nondistinguished variable appearing nowhere else in T', and so by Claim
1, every w_i is v. So suppose that w_{m+1} is a row representing a truth
assignment for some F_k. Therefore, row w_{m+1} has a distinguished
variable in the E_k-column, and in all the other E-columns it has
distinct nondistinguished variables appearing nowhere else in T'. By
Claim 1, for all i≠k, rows w_i and w_{m+1} are identical. Row w_k must
have a distinguished variable in the E_k-column, since w_{m+1} has a
distinguished variable in this column and both S_k and S_{m+1} contain
E_k. By Claim 2, row w_k cannot be u, because there is a row w_j (j≠m+1)
that is not u (since m>1). Thus, all the w_i's (i≠k) are equal to a row
of T' representing a truth assignment for F_k, and w_k is also a row
representing a truth assignment for F_k. But every variable x_i appears
in more than one clause and, hence, the pattern of the distinguished
variables in the A-columns of tableau S implies that w_k represents the
same truth assignment as all the other w_i's. That is, all the w_i's are
identical.

So far we have shown that if w_{m+1} is not u, then all the w_i's are
identical. Now suppose that some w_j is not a row representing a truth
assignment for F_j. If w_j is u, then Claim 2 implies that for all
1≤i≤m, row w_i is u. But w_{m+1} is also u, and so all the w_i's are
identical. If w_j is either v or a row representing a truth assignment
for some F_k (j≠k), then w_j has in the E_j-column a nondistinguished
variable appearing nowhere else in T', and so by Claim 1, rows w_j and
w_{m+1} are identical. But w_{m+1} is u, and so Claim 2 implies that all
the w_i's are identical. []

Corollary 2: There are rows w_1,...,w_{m+1} of T' that are joinable on
S_1,...,S_{m+1} with a result w not in T' if and only if Q is
satisfiable.

Proof: Only if. By Lemma 1, row w_j (1≤j≤m) represents the following
truth assignment for F_j. If x_{ji} is a variable of F_j, and the
A_{ji}-column of w_j has the repeated nondistinguished variable that
represents the truth value 1, then x_{ji} is assigned 1. If the
A_{ji}-column of w_j has the repeated nondistinguished variable that
represents the truth value 0, then x_{ji} is assigned 0. Under this
truth assignment F_j is true. But the pattern of the distinguished
variables in the A-columns of tableau S implies that in this case there
is a truth assignment ψ for all the variables x_1,...,x_n such that for
all 1≤j≤m, the truth assignment ψ agrees with the truth assignment
represented by w_j on the variables of F_j. Hence, each F_j is true
under ψ, and Q is satisfiable.


If. Suppose that ψ is a truth assignment that satisfies Q. For all
1≤j≤m, let w_j be the row of T' representing the truth assignment for
F_j that agrees with ψ on the variables of F_j; and let w_{m+1} be row u
of T'. It is easy to show that the rows w_1,...,w_{m+1} are joinable on
S_1,...,S_{m+1} with a result w not in T'. []

Lemma 3: The JD σ (corresponding to T) is a consequence of Σ if and
only if Q is satisfiable.

Proof: Only if. By Corollary 2, if Q is not satisfiable, then the
only JD-rule for Σ cannot be applied to T'. Therefore, chase_Σ(T) is the
result of applying the FD-rules to T, i.e., chase_Σ(T) = T'. This chase
does not contain a row with only distinguished variables and, hence, σ
is not a consequence of Σ.

If. Suppose that Q is satisfiable. By Lemma 1 and Corollary 2, an
application of the JD-rule for Σ to T' adds a row w that has
distinguished variables in all the E, B, C, and D columns. We can apply
the FD-rules for D_1D_2 → A_i (1≤i≤n) to w and v (the last row of T'),
and the result is a row with only distinguished variables. Thus, σ is a
consequence of Σ. []
3.2 NP-Completeness Results Concerning Applications of JD-Rules and
Testing Whether Relations Obey Join Dependencies

Theorem  4:  The  problem  of  deciding  whether  a  JD-rule  can  be  applied 
to  a  tableau  U  is  NP-complete.  This  problem  is  NP-complete  even  if  U 
can  be  obtained  from  a  tableau  corresponding  to  a  JD  by  applying  some 
FD-rules. 

Proof: At first we will show that the problem is in NP. Suppose we
have to decide whether the JD-rule for a JD *[R_1,...,R_q] can be applied
to a tableau U. We nondeterministically choose q rows w_1,...,w_q of U,
and check in polynomial time whether they are joinable on R_1,...,R_q
with a result w not in U.

To show that the problem is complete in NP, the 3-SAT problem can
be reduced to this problem as described in Section 3.1. That is, given
an instance Q of the 3-SAT problem, we construct the JD
*[S_1,...,S_{m+1}] and the tableau T. By applying some FD-rules to T, we
obtain the tableau T'. By Corollary 2, the JD-rule for
*[S_1,...,S_{m+1}] can be applied to T' if and only if Q is
satisfiable. []

Corollary 5: It is NP-complete to decide whether a JD *[R_1,...,R_q]
does not hold in a relation r.

Proof: The problem is in NP, since we can nondeterministically find
q tuples of r that are joinable on R_1,...,R_q with a result that is not
a tuple of r.

To show that the problem is complete in NP, we can view the tableau
T' as a relation r (by replacing each variable with a distinct
constant). By Corollary 2, the JD *[S_1,...,S_{m+1}] does not hold in r
if and only if Q is satisfiable. []


3.3 An NP-Completeness Result for Inferring Join Dependencies

Theorem 6: Let Γ be a set of FD's and one JD, and let γ be another
JD. The problem of deciding whether γ is a consequence of Γ is
NP-complete.

Proof: Let *[R_1,...,R_q] be the only JD in Γ. At first we show that
the problem is in NP. Let U be a tableau and suppose that chase_Γ(U) can
be obtained from U by using only the JD-rule for Γ. The following claim
shows that any row of chase_Γ(U) (that is not in U) can be obtained by a
single application of the JD-rule for Γ to some rows of U.

Claim 1: If a tableau U' is obtained by repeatedly applying the
JD-rule for Γ to a tableau U, then any row of U' is the result of
joining some rows of U on R_1,...,R_q.

In order to prove this claim, suppose that the JD-rule for Γ is
applied only to the original rows of U until it cannot be applied
anymore. Let the resulting tableau be Ū. It suffices to show that the
JD-rule for Γ cannot be applied to Ū. So suppose that the JD-rule can
be applied to Ū. That is, there are rows w_1,...,w_q of Ū that are
joinable on R_1,...,R_q with a result w not in Ū. If some w_i is not in
U, then there are rows v_1,...,v_q in U that are joinable on R_1,...,R_q
with a result w_i. But w_i and v_i agree on R_i and, hence, w_i can be
replaced with v_i. That is, w_1,...,w_{i-1},v_i,w_{i+1},...,w_q are
joinable on R_1,...,R_q with a result w. It follows that every w_i that
is not in U can be replaced with some row in U, and the resulting rows
are joinable on R_1,...,R_q with a result w. Therefore, w is also in Ū,
a contradiction.

Now suppose that no FD-rule for Γ can be applied to a tableau U,
but some FD-rules for Γ can be applied to a tableau U', where U' is
obtained from U by applying the JD-rule for Γ several times. That is,
there are rows v and w of U' such that some FD-rule for Γ can be applied
to v and w. By Claim 1, rows v and w can be generated by applying the
JD-rule to rows of U (unless they are already in U). By using a
nondeterministic algorithm, rows v and w can be obtained in polynomial
time in no more than two applications of the JD-rule for Γ. Therefore,
in order to generate any row of chase_Γ(U), we can always find a sequence
of applications of the rules for Γ in which the JD-rule is never used
more than twice in a row. Let n be the number of distinct variables in
U. Each application of an FD-rule reduces the number of distinct
variables by one. Thus, the FD-rules can be applied to U no more than n
times. Since each application of an FD-rule is preceded by no more than
two applications of the JD-rule for Γ, we can generate any row of
chase_Γ(U) in O(n) applications of the rules for Γ. In particular, we
can use a nondeterministic algorithm to generate the row consisting only
of distinguished variables (if this row is indeed in chase_Γ(U)) in O(n)
applications of the rules for Γ.

The following is a nondeterministic polynomial time algorithm that
returns "Yes" if γ is a consequence of Γ. The tableau for γ is denoted
by V.

(1) Nondeterministically create two rows v_1 and v_2 such that each v_i
is either a row of V or can be obtained by joining some rows of V on
R_1,...,R_q.

(2) If either v_1 or v_2 consists only of distinguished variables, then
return "Yes".

(3) Add v_1 and v_2 to V (if they are not already there).

(4) Apply the FD-rules to V until no FD-rule can be applied. If at
least one FD-rule has been applied, then go to (1).

Steps (1)-(3) require nondeterministic linear time. Step (4)
requires (deterministic) polynomial time [ABU]. Each application of an
FD-rule reduces the number of distinct variables in V by one, and Step
(1) is repeated only after an application of some FD-rule. Therefore,
no more than O(n) rows are added to V, and the algorithm has a
nondeterministic polynomial running time.

It remains to be shown that the problem is NP-complete. The 3-SAT
problem can be reduced to this problem as described in Section 3.1, and
the NP-completeness follows from Lemma 3. []
3.4 An NP-Hard Result for Computing the Join of Several Relations

In this section we show that computing the join of several
relations is a hard problem (even if the relations come from a universal
instance). We assume familiarity with the definition of the join
operator, and the correspondence between tableaux and relational
expressions (cf. [ASU]). It should be noted that a similar result is
stated in [HLY]. However, our result is stronger, since we assume that
the relations are obtained by projection from a universal instance.

Theorem 7: Let E be a relational expression with the join as the
only operator, let I be a universal instance, and let r be a relation.
The problem of deciding whether E(I) ≠ r is NP-hard. (E(I) is the value
of the expression E for the instance I.)

Proof: We can view the tableau T' of Section 3.1 as a universal
instance, and the tableau S as representing the relational expression
*_{i=1}^{m+1} S_i. Thus the 3-SAT problem can be reduced to this problem
in the following way. Given a Boolean expression Q, we construct the
relational expression *_{i=1}^{m+1} S_i corresponding to the tableau S
of Section 3.1. The instance I is obtained from the tableau T' by
replacing each variable of T' with a distinct element from the domain of
the corresponding attribute. The relation r is the same as the instance
I. By Corollary 2, the relation r is not the value of
*_{i=1}^{m+1} S_i for I if and only if Q is satisfiable. []